Efficient Histogram Algorithms for NVIDIA CUDA Compatible Devices

نویسندگان

  • Ramtin Shams
  • R. A. Kennedy
چکیده

We present two efficient histogram algorithms designed for NVIDIA’s compute unified device architecture (CUDA) compatible graphics processor units (GPUs). Our algorithm can be used for parallel computation of histograms on large data-sets and for thousands of bins. Traditionally histogram computation has been difficult and inefficient on the GPU. This often means that GPU-based implementation of the algorithms that require histogram calculation as part of their computation, require to transfer data between the GPU and the host memory, which can be a significant bottleneck. Our algorithms remove the need for such costly data transfers by allowing efficient histogram calculation on the GPU. We show that the speed of histogram calculations can be improved by up to 30 times compared to a CPU-based implementation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation with Nvidia CUDA Compatible Devices

In this paper, we propose an acceleration of collapsed variational Bayesian (CVB) inference for latent Dirichlet allocation (LDA) by using Nvidia CUDA compatible devices. While LDA is an efficient Bayesian multi-topic document model, it requires complicated computations for parameter estimation in comparison with other simpler document models, e.g. probabilistic latent semantic indexing, etc. T...

متن کامل

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the pr...

متن کامل

CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations

CuPy 1 is an open-source library with NumPy syntax that increases speed by doing matrix operations on NVIDIA GPUs. It is accelerated with the CUDA platform from NVIDIA and also uses CUDA-related libraries, including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, and NCCL, to make full use of the GPU architecture. CuPy’s interface is highly compatible with NumPy; in most cases it can be used as a dr...

متن کامل

Tracking yarns in high resolution fabric images: a real-time approach for online fabric flaw detection

An algorithmic framework for real-time localization of single yarns within industrial fabric images is presented. The information about precise yarn locations forms the foundation for a fabric flaw detection system which is based on individual yarn measurements. Matching a camera frame rate of 15 fps, we define the term “real-time” by the capability of tracking all yarns within a 5 megapixel im...

متن کامل

Stereoscopic video chroma key processing using NVIDIA CUDA

In this paper, I use the NVIDIA CUDA technology to perform the chroma key algorithm on stereoscopic images. NVIDIA CUDA allows to process parallel algorithms on GPU. Input data are stereoscopic images with the monochromatic background and the destination background image. Output data is the combination of inputs by using the chroma key. I compare the algorithm efficiency between the GPU and CPU...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007